Spam Filtering Based on Supervised Latent Semantic Features Extraction
نویسندگان
چکیده
Spam text is an universal phenomenon on the “open web”, including large-scale email systems and the growing number of Blogs. Handling this information overload is becoming an increasingly challenging problem, A promising approach is the using of content-based filtering. In this paper, our focus is placed on finding effective dimension reduction method for email Spam filtering, we apply a supervised latent semantic features to extract the features according to their filtering powers. Using Support Vector Machines and k-Nearest Neighbors classification model, Experiments on TREC 2005 shows that supervised latent semantic features extraction can improve the junk mail detection performance .
منابع مشابه
Spam Filtering Based on Latent Semantic Indexing
In this paper, a study on the classification performance of a vector space model (VSM) and of latent semantic indexing (LSI) applied to the task of spam filtering is summarized. Based on a feature set used in the extremely widespread, de-facto standard spam filtering system SpamAssassin, a vector space model and latent semantic indexing are applied for classifying e-mail messages as spam or not...
متن کاملContent-Based Spam Filtering on Video Sharing Social Networks
In this work we are concerned with the detection of spam in video sharing social networks. Specifically, we investigate how much visual content-based analysis can aid in detecting spam in videos. This is a very challenging task, because of the high-level semantic concepts involved; of the assorted nature of social networks, preventing the use of constrained a priori information; and, what is pa...
متن کاملSpam Filtering using Contextual Network Graphs
This document describes a machine-learning solution to the spam-filtering problem. Spam-filtering is treated as a text-classification problem in very high dimension space. Two new text-classification algorithms, Latent Semantic Indexing (LSI) and Contextual Network Graphs (CNG) are compared to existing Bayesian techniques by monitoring their ability to process and correctly classify a series of...
متن کاملA Novel Method of Text Clustering for Chinese Spam Based on Semantic Body
The effect of spam filtering method based on statistics is not good in filtering the new-type spam with synonymous substitution and camouflage. So a new text clustering method based on Semantic Body for filtering Chinese spam is proposed. In this paper, the word sense disambiguation, lexical chain based on HowNet and statistic-based TFIDF are adopted to extract features of mails. The Semantic B...
متن کاملApplication of Refined LSA and MD5 Algorithms in Spam Filtering
The paper proposes a spam filtering method that uses integrated and refined Latent Semantic Analysis (LSA) and Message-Digest Algorithm 5 (MD5) algorithms to address a series of universal problems in spam filtering, including remarkably lowered filtering precision and notably unbalanced filtering efficiency as a result of lack of latent semantic analysis of mail contents. In introducing LSA, it...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008